Review of methods for handling confounding by cluster and informative cluster size in clustered data
Authors
Abstract
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland.
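To make CBC concrete, the sketch below simulates a hypothetical data-generating process in which the cluster effect is correlated with the covariate values of every member of the cluster. It then compares a naive pooled regression with the familiar within/between decomposition, which splits the covariate into its cluster mean and the member's deviation from that mean. This is an illustrative assumption. It uses ordinary least squares rather than a GLMM or GEE and is not presented as any specific method from the review. The function names `simulate_cbc` and `within_between_fit` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_cbc(n_clusters=2000, size=5, beta=1.0, gamma=2.0, seed=0):
    """Hypothetical data with confounding by cluster.

    x_ij = a_i + e_ij and the cluster effect b_i = gamma * a_i + noise, so b_i
    is correlated with the covariates of every member of cluster i.
    The within-cluster (cluster-specific) effect of x on y is beta.
    """
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), size)
    a = rng.normal(size=n_clusters)
    b = gamma * a + rng.normal(scale=0.5, size=n_clusters)
    x = a[cluster] + rng.normal(size=cluster.size)
    y = beta * x + b[cluster] + rng.normal(size=cluster.size)
    return cluster, x, y


def ols(design, y):
    """Least-squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def within_between_fit(cluster, x, y):
    """Regress y on (x - cluster mean of x) and on the cluster mean of x."""
    counts = np.bincount(cluster)
    xbar = (np.bincount(cluster, weights=x) / counts)[cluster]
    design = np.column_stack([np.ones_like(x), x - xbar, xbar])
    return ols(design, y)  # [intercept, within slope, between slope]


cluster, x, y = simulate_cbc()
naive = ols(np.column_stack([np.ones_like(x), x]), y)
decomposed = within_between_fit(cluster, x, y)
print(f"naive pooled slope: {naive[1]:.2f}")  # mixes within and between effects
print(f"within slope: {decomposed[1]:.2f}, between slope: {decomposed[2]:.2f}")
```

Under these assumed settings the naive slope should land near 2 and the within slope near the true value of 1. This is because the deviation x - xbar sums to zero within each cluster and is therefore orthogonal to any cluster-level effect. A between slope that differs markedly from the within slope is one informal signal of CBC.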
Related articles
Cluster adjusted regression for displaced subject data (CARDS): Marginal inference under potentially informative temporal cluster size profiles.
Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information is dependent on the outcome. In the analysis of cluster data we say that a condition for informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data varies from that of infer...
Marginal association measures for clustered data.
The use of correlation coefficients in measuring the association between two continuous variables is common, but regular methods of calculating correlations have not been extended to the clustered data framework. For clustered data in which observations within a cluster may be correlated, regular inferential procedures for calculating marginal association between two variables can be biased. Th...
Efficient Estimation Methods for Informative Cluster Size Data
Based on clustered data with informative cluster size, two efficient estimation methods are proposed for marginal models. In our procedures, the information of within-cluster correlation and minimum cluster size is fully used; this is not the case with the within-cluster re-sampling (WCR) and cluster-weighted generalized estimating equation (CWGEE) methods. When the correlation model is valid a...
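The baseline methods named above, WCR and cluster weighting, can be compared in the simplest possible marginal model, an intercept-only mean. The sketch below assumes a hypothetical ICS mechanism in which clusters with a lower latent level u_i tend to be larger. WCR averages over resamples that draw one member per cluster. Weighting every member by 1/n_i targets the same quantity, the mean of the cluster means. The unweighted member-level mean, by contrast, is pulled towards the large clusters. The name `wcr_mean` and all settings are illustrative and are not taken from the cited paper. The sketch does not implement the paper's proposed efficient methods.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ICS mechanism: clusters with lower latent level u_i have more members.
m = 500
u = rng.normal(size=m)
sizes = 1 + rng.poisson(np.exp(1.0 - u))        # cluster sizes n_i >= 1
cluster = np.repeat(np.arange(m), sizes)         # members stored cluster by cluster
y = u[cluster] + rng.normal(size=cluster.size)

# Unweighted member-level mean: large clusters dominate.
naive_mean = y.mean()

# Cluster-weighted estimate: weight 1/n_i per member, so each cluster counts equally.
w = 1.0 / sizes[cluster]
weighted_mean = np.sum(w * y) / np.sum(w)


def wcr_mean(y, sizes, n_resamples, rng):
    """Within-cluster resampling: pick one member per cluster, average the resample means."""
    starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))  # index of each cluster's first member
    estimates = np.empty(n_resamples)
    for r in range(n_resamples):
        picks = starts + rng.integers(0, sizes)   # one random offset within each cluster
        estimates[r] = y[picks].mean()
    return estimates.mean()


wcr = wcr_mean(y, sizes, n_resamples=2000, rng=rng)
print(f"naive: {naive_mean:.3f}  cluster-weighted: {weighted_mean:.3f}  WCR: {wcr:.3f}")
```

For a fixed dataset, the WCR average converges to the cluster-weighted mean as the number of resamples grows. The two estimates should therefore agree closely here, while the naive mean sits well below both.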
Analysis of clustered data when the cluster size is informative
Clustered data arise in many scenarios. We may wish to fit a marginal regression model relating outcome measurements to covariates for cluster members. Often the cluster size, the number of members, varies. Informative cluster size (ICS) has been defined to arise when the outcome depends on the cluster size conditional on covariates. If the clusters are considered complete then the population o...
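The definition quoted above suggests a crude check: add the cluster size n_i as an extra covariate and see whether its coefficient departs from zero. The sketch below is an illustrative assumption, not one of the formal ICS tests in the literature. It fits ordinary least squares to hypothetical simulated data, and the names `simulate` and `size_coefficient` and all settings are made up. A real analysis would need cluster-robust standard errors or a cluster bootstrap before drawing any conclusion.

```python
import numpy as np


def simulate(informative, m=400, seed=2):
    """Hypothetical clustered data; if informative, larger clusters have lower latent u_i."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=m)
    rate = np.exp(1.0 - u) if informative else np.full(m, np.e)
    sizes = 1 + rng.poisson(rate)
    cluster = np.repeat(np.arange(m), sizes)
    x = rng.normal(size=cluster.size)
    y = 0.5 * x + u[cluster] + rng.normal(size=cluster.size)
    return cluster, sizes, x, y


def size_coefficient(cluster, sizes, x, y):
    """OLS coefficient of cluster size n_i in y ~ 1 + x + n_i (one row per member)."""
    n = sizes[cluster].astype(float)
    design = np.column_stack([np.ones_like(x), x, n])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]


coef_ics = size_coefficient(*simulate(informative=True))
coef_null = size_coefficient(*simulate(informative=False))
print(f"size coefficient, ICS present: {coef_ics:.3f}")
print(f"size coefficient, ICS absent:  {coef_null:.3f}")
```

In this setup the first coefficient should be negative, because members of larger clusters have systematically lower outcomes given x. The second coefficient should be close to zero.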
Methods for Observed-Cluster Inference When Cluster Size Is Informative: A Review and Clarifications
Clustered data commonly arise in epidemiology. We assume each cluster member has an outcome Y and covariates X. When there are missing data in Y, the distribution of Y given X in all cluster members ("complete clusters") may be different from the distribution just in members with observed Y ("observed clusters"). Often the former is of interest, but when data are missing because in a fundamenta...